Overview

Brought to you by YData

Dataset statistics

                                 Dataset A    Dataset B
Number of variables              12           12
Number of observations           446          446
Missing cells                    435          424
Missing cells (%)                8.1%         7.9%
Duplicate rows                   0            0
Duplicate rows (%)               0.0%         0.0%
Total size in memory             45.3 KiB     45.3 KiB
Average record size in memory    104.0 B      104.0 B
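
The percentage and per-record figures in the dataset statistics can be cross-checked from the raw counts. The sketch below assumes that Missing cells (%) is the number of missing cells divided by variables times observations, and that 1 KiB is 1024 bytes; the function names are illustrative and are not part of ydata-profiling.

```python
def missing_cells_pct(missing_cells: int, n_variables: int, n_observations: int) -> float:
    """Share of empty cells in an n_observations x n_variables table, in percent."""
    total_cells = n_variables * n_observations
    return 100.0 * missing_cells / total_cells


def avg_record_size_bytes(total_kib: float, n_observations: int) -> float:
    """Average bytes per row, given the total in-memory size in KiB (assumed 1024 B)."""
    return total_kib * 1024 / n_observations


# Values taken from the Dataset statistics table.
print(f"Dataset A missing cells: {missing_cells_pct(435, 12, 446):.1f}%")
print(f"Dataset B missing cells: {missing_cells_pct(424, 12, 446):.1f}%")
print(f"Average record size: {avg_record_size_bytes(45.3, 446):.1f} B")
```

Under these assumptions the rounded results agree with the reported 8.1%, 7.9%, and 104.0 B.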

Variable types

               Dataset A    Dataset B
Numeric        5            5
Categorical    4            4
Text           3            3

Alerts

Dataset A                                         Dataset B                                         Alert type
Sex is highly overall correlated with Survived    Sex is highly overall correlated with Survived    High correlation
Survived is highly overall correlated with Sex    Survived is highly overall correlated with Sex    High correlation
Age has 92 (20.6%) missing values                 Age has 89 (20.0%) missing values                 Missing
Cabin has 343 (76.9%) missing values              Cabin has 335 (75.1%) missing values              Missing
PassengerId has unique values                     PassengerId has unique values                     Unique
Name has unique values                            Name has unique values                            Unique
SibSp has 293 (65.7%) zeros                       SibSp has 300 (67.3%) zeros                       Zeros
Parch has 337 (75.6%) zeros                       Parch has 340 (76.2%) zeros                       Zeros
Fare has 9 (2.0%) zeros                           Fare has 8 (1.8%) zeros                           Zeros
Alert not present in this dataset                 Fare is highly overall correlated with Pclass     High correlation
Alert not present in this dataset                 Pclass is highly overall correlated with Fare     High correlation
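
The Missing and Zeros alerts pair a raw count with its share of the 446 observations. As a rough illustration, the hypothetical helper below rebuilds the alert wording from a count; it mirrors the text of the report and is not the library's own implementation.

```python
def count_alert(column: str, count: int, n_rows: int, what: str) -> str:
    """Format a count-based alert the way the report words it."""
    pct = 100.0 * count / n_rows
    return f"{column} has {count} ({pct:.1f}%) {what}"


N_ROWS = 446  # observations per dataset, from the Dataset statistics table

print(count_alert("Age", 92, N_ROWS, "missing values"))    # Dataset A
print(count_alert("Age", 89, N_ROWS, "missing values"))    # Dataset B
print(count_alert("SibSp", 293, N_ROWS, "zeros"))          # Dataset A
```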

Reproduction

                          Dataset A                     Dataset B
Analysis started          2025-03-26 00:45:09.042504    2025-03-26 00:45:11.072139
Analysis finished         2025-03-26 00:45:11.069239    2025-03-26 00:45:13.163211
Duration                  2.03 seconds                  2.09 seconds
Software version          ydata-profiling v0.0.dev0     ydata-profiling v0.0.dev0
Download configuration    config.json                   config.json
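
The Duration row is the difference between the finish and start timestamps. A minimal sketch using only the standard library, assuming the timestamps follow the format shown in the table (the helper name is illustrative):

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"  # format assumed from the table above


def duration_seconds(started: str, finished: str) -> float:
    """Elapsed time between two report timestamps, in seconds."""
    start = datetime.strptime(started, TIMESTAMP_FORMAT)
    finish = datetime.strptime(finished, TIMESTAMP_FORMAT)
    return (finish - start).total_seconds()


dataset_a = duration_seconds("2025-03-26 00:45:09.042504", "2025-03-26 00:45:11.069239")
dataset_b = duration_seconds("2025-03-26 00:45:11.072139", "2025-03-26 00:45:13.163211")
print(f"Dataset A: {dataset_a:.2f} seconds")
print(f"Dataset B: {dataset_b:.2f} seconds")
```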

Variables

PassengerId
Real number (ℝ)

                Dataset A    Dataset B
Distinct        446          446
Distinct (%)    100.0%       100.0%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            451.32511    438.56502
Minimum         1            1
Maximum         891          891
Zeros           0            0
Zeros (%)       0.0%         0.0%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

                             Dataset A    Dataset B
Minimum                      1            1
5-th percentile              44.25        37.25
Q1                           241.25       218.5
median                       453.5        434.5
Q3                           667          667.5
95-th percentile             854.5        840.75
Maximum                      891          891
Range                        890          890
Interquartile range (IQR)    425.75       449
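
Range and IQR follow directly from the other quantile rows: the range is maximum minus minimum, and the IQR is Q3 minus Q1. A small sketch with an illustrative helper name:

```python
def spread(minimum: float, q1: float, q3: float, maximum: float) -> dict:
    """Derive range and interquartile range from already-computed quantiles."""
    return {"range": maximum - minimum, "iqr": q3 - q1}


# PassengerId quantiles from the table above.
print("Dataset A:", spread(1, 241.25, 667, 891))
print("Dataset B:", spread(1, 218.5, 667.5, 891))
```

The same relation holds for the quantile tables of the other numeric variables in this report.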

Descriptive statistics

                                   Dataset A        Dataset B
Standard deviation                 253.27572        258.36105
Coefficient of variation (CV)      0.56118241       0.58910545
Kurtosis                           -1.1364427       -1.1829911
Mean                               451.32511        438.56502
Median Absolute Deviation (MAD)    213              225.5
Skewness                           -0.017234613     0.055845698
Sum                                201291           195600
Variance                           64148.588        66750.431
Monotonicity                       Not monotonic    Not monotonic
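
Several descriptive statistics are tied together by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of non-missing values. The sketch below checks the PassengerId figures against these identities; the helper is illustrative, and the tolerances allow for the rounding of the printed inputs.

```python
def derived_stats(mean: float, std: float, n: int) -> dict:
    """Quantities implied by the mean, standard deviation, and non-missing count."""
    return {"cv": std / mean, "variance": std ** 2, "sum": mean * n}


dataset_a = derived_stats(mean=451.32511, std=253.27572, n=446)
dataset_b = derived_stats(mean=438.56502, std=258.36105, n=446)

for name, stats in (("Dataset A", dataset_a), ("Dataset B", dataset_b)):
    print(name, {key: round(value, 4) for key, value in stats.items()})
```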
Histogram with fixed size bins (bins=50)
Common values (each listed value occurs once, 0.2%)
Dataset A: 664, 83, 479, 85, 11, 146, 116, 341, 360, 336; Other values (436): 436 (97.8%)
Dataset B: 96, 566, 696, 240, 462, 560, 622, 632, 384, 218; Other values (436): 436 (97.8%)

Lowest values (each occurs once, 0.2%)
Dataset A: 1, 2, 3, 5, 7, 8, 11, 14, 15, 20
Dataset B: 1, 4, 5, 8, 9, 10, 11, 12, 13, 14

Survived
Categorical

                Dataset A    Dataset B
Distinct        2            2
Distinct (%)    0.4%         0.4%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Length

                 Dataset A    Dataset B
Max length       1            1
Median length    1            1
Mean length      1            1
Min length       1            1

Characters and Unicode

                       Dataset A    Dataset B
Total characters       446          446
Distinct characters    2            2
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

           Dataset A    Dataset B
1st row    1            0
2nd row    0            0
3rd row    1            0
4th row    1            0
5th row    0            1

Common Values

Value    Dataset A      Dataset B
0        283 (63.5%)    269 (60.3%)
1        163 (36.5%)    177 (39.7%)

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]

Most occurring characters

Value    Dataset A      Dataset B
0        283 (63.5%)    269 (60.3%)
1        163 (36.5%)    177 (39.7%)

Most occurring categories, scripts, and blocks

Value        Dataset A       Dataset B
(unknown)    446 (100.0%)    446 (100.0%)

Every character falls under the single (unknown) category, script, and block, so the most frequent characters per category, per script, and per block are the counts listed under Most occurring characters.

Pclass
Categorical

                Dataset A    Dataset B
Distinct        3            3
Distinct (%)    0.7%         0.7%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Length

                 Dataset A    Dataset B
Max length       1            1
Median length    1            1
Mean length      1            1
Min length       1            1

Characters and Unicode

                       Dataset A    Dataset B
Total characters       446          446
Distinct characters    3            3
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

           Dataset A    Dataset B
1st row    3            3
2nd row    3            2
3rd row    2            2
4th row    3            3
5th row    2            3

Common Values

Value    Dataset A      Dataset B
3        250 (56.1%)    238 (53.4%)
1        104 (23.3%)    115 (25.8%)
2        92 (20.6%)     93 (20.9%)

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]

Most occurring characters

Value    Dataset A      Dataset B
3        250 (56.1%)    238 (53.4%)
1        104 (23.3%)    115 (25.8%)
2        92 (20.6%)     93 (20.9%)

Most occurring categories, scripts, and blocks

Value        Dataset A       Dataset B
(unknown)    446 (100.0%)    446 (100.0%)

Every character falls under the single (unknown) category, script, and block, so the most frequent characters per category, per script, and per block are the counts listed under Most occurring characters.

Name
Text

                Dataset A    Dataset B
Distinct        446          446
Distinct (%)    100.0%       100.0%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Length

                 Dataset A    Dataset B
Max length       82           82
Median length    49           48
Mean length      26.547085    27.414798
Min length       12           12

Characters and Unicode

                       Dataset A    Dataset B
Total characters       11840        12227
Distinct characters    60           59
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        446          446
Unique (%)    100.0%       100.0%

Sample

           Dataset A                            Dataset B
1st row    McDermott, Miss. Brigdet Delia       Davies, Mr. Alfred J
2nd row    Karlsson, Mr. Nils August            Chapman, Mr. Charles Henry
3rd row    Ilett, Miss. Bertha                  Hunt, Mr. George Henry
4th row    Sandstrom, Miss. Marguerite Rut      Morley, Mr. William
5th row    Nicholls, Mr. Joseph Charles         de Messemaeker, Mrs. Guillaume Joseph (Emma)

Dataset A
Value                 Count    Frequency (%)
mr                    257      14.3%
miss                  95       5.3%
mrs                   60       3.3%
william               30       1.7%
master                22       1.2%
john                  21       1.2%
henry                 16       0.9%
james                 13       0.7%
george                12       0.7%
charles               12       0.7%
Other values (866)    1259     70.1%

Dataset B
Value                 Count    Frequency (%)
mr                    261      14.2%
miss                  83       4.5%
mrs                   70       3.8%
william               30       1.6%
master                21       1.1%
john                  19       1.0%
henry                 18       1.0%
george                15       0.8%
charles               12       0.7%
joseph                11       0.6%
Other values (898)    1300     70.7%

Most occurring characters

Value           Dataset A                    Dataset B
(space)         1351 (11.4%)                 1395 (11.4%)
r               944 (8.0%)                   1026 (8.4%)
e               852 (7.2%)                   897 (7.3%)
a               822 (6.9%)                   819 (6.7%)
i               649 (5.5%)                   650 (5.3%)
n               637 (5.4%)                   666 (5.4%)
s               637 (5.4%)                   631 (5.2%)
M               560 (4.7%)                   547 (4.5%)
l               530 (4.5%)                   535 (4.4%)
o               503 (4.2%)                   542 (4.4%)
Other values    4355 (36.8%), 50 distinct    4519 (37.0%), 49 distinct

Most occurring categories, scripts, and blocks

Value        Dataset A         Dataset B
(unknown)    11840 (100.0%)    12227 (100.0%)

Every character falls under the single (unknown) category, script, and block, so the most frequent characters per category, per script, and per block are the counts listed under Most occurring characters.

Sex
Categorical

                Dataset A    Dataset B
Distinct        2            2
Distinct (%)    0.4%         0.4%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Length

                 Dataset A    Dataset B
Max length       6            6
Median length    4            4
Mean length      4.6995516    4.690583
Min length       4            4

Characters and Unicode

                       Dataset A    Dataset B
Total characters       2096         2092
Distinct characters    5            5
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

           Dataset A    Dataset B
1st row    female       male
2nd row    male         male
3rd row    female       male
4th row    female       male
5th row    male         female

Common Values

Value     Dataset A      Dataset B
male      290 (65.0%)    292 (65.5%)
female    156 (35.0%)    154 (34.5%)
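
For a categorical variable, the length and character figures follow from the category counts alone. The sketch below recomputes Total characters, Mean length, and per-character counts for Sex from the Common Values counts; the helper names are illustrative and are not part of ydata-profiling.

```python
from collections import Counter


def length_stats(value_counts: dict) -> tuple:
    """Return (total characters, mean length) for a {value: count} mapping."""
    total_chars = sum(len(value) * count for value, count in value_counts.items())
    n_rows = sum(value_counts.values())
    return total_chars, total_chars / n_rows


def char_counts(value_counts: dict) -> Counter:
    """Count each character, weighting every value by how often it occurs."""
    chars = Counter()
    for value, count in value_counts.items():
        for ch in value:
            chars[ch] += count
    return chars


dataset_a = {"male": 290, "female": 156}
dataset_b = {"male": 292, "female": 154}

print(length_stats(dataset_a), char_counts(dataset_a).most_common())
print(length_stats(dataset_b), char_counts(dataset_b).most_common())
```

The letter e appears once in "male" and twice in "female", which is why it leads the character counts for this variable.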

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]

Most occurring characters

Value    Dataset A      Dataset B
e        602 (28.7%)    600 (28.7%)
m        446 (21.3%)    446 (21.3%)
a        446 (21.3%)    446 (21.3%)
l        446 (21.3%)    446 (21.3%)
f        156 (7.4%)     154 (7.4%)

Most occurring categories, scripts, and blocks

Value        Dataset A        Dataset B
(unknown)    2096 (100.0%)    2092 (100.0%)

Every character falls under the single (unknown) category, script, and block, so the most frequent characters per category, per script, and per block are the counts listed under Most occurring characters.

Age
Real number (ℝ)

                Dataset A    Dataset B
Distinct        71           71
Distinct (%)    20.1%        19.9%
Missing         92           89
Missing (%)     20.6%        20.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            29.281073    29.234118
Minimum         0.67         0.75
Maximum         71           80
Zeros           0            0
Zeros (%)       0.0%         0.0%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB
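
The Age figures suggest that Distinct (%) and Missing (%) use different denominators: 71 distinct values out of 446 rows would be about 15.9%, whereas the reported 20.1% and 19.9% match 71 divided by the non-missing count. This is an inference from the numbers in this report, not a statement about how ydata-profiling is implemented. A sketch with illustrative helper names:

```python
def missing_pct(n_missing: int, n_rows: int) -> float:
    """Missing values as a share of all rows, in percent."""
    return 100.0 * n_missing / n_rows


def distinct_pct(n_distinct: int, n_rows: int, n_missing: int) -> float:
    """Distinct values as a share of non-missing rows, in percent (assumed)."""
    return 100.0 * n_distinct / (n_rows - n_missing)


# Age, Dataset A then Dataset B.
print(f"{missing_pct(92, 446):.1f}% missing, {distinct_pct(71, 446, 92):.1f}% distinct")
print(f"{missing_pct(89, 446):.1f}% missing, {distinct_pct(71, 446, 89):.1f}% distinct")
```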

Quantile statistics

                             Dataset A    Dataset B
Minimum                      0.67         0.75
5-th percentile              3.65         4
Q1                           20           20
median                       29           28
Q3                           37.75        38
95-th percentile             57.35        54.2
Maximum                      71           80
Range                        70.33        79.25
Interquartile range (IQR)    17.75        18

Descriptive statistics

                                   Dataset A        Dataset B
Standard deviation                 14.662013        14.597747
Coefficient of variation (CV)      0.50073346       0.49933942
Kurtosis                           -0.0033282249    0.12905908
Mean                               29.281073        29.234118
Median Absolute Deviation (MAD)    9                9
Skewness                           0.28205924       0.3941794
Sum                                10365.5          10436.58
Variance                           214.97463        213.09422
Monotonicity                       Not monotonic    Not monotonic
Histogram with fixed size bins (bins=50)
Common values, Dataset A
Value                Count    Frequency (%)
24                   15       3.4%
30                   15       3.4%
36                   14       3.1%
28                   13       2.9%
22                   13       2.9%
19                   12       2.7%
32                   12       2.7%
18                   12       2.7%
21                   10       2.2%
29                   10       2.2%
Other values (61)    228      51.1%
(Missing)            92       20.6%

Common values, Dataset B
Value                Count    Frequency (%)
22                   16       3.6%
24                   14       3.1%
36                   14       3.1%
21                   14       3.1%
18                   14       3.1%
27                   12       2.7%
19                   12       2.7%
30                   12       2.7%
25                   11       2.5%
16                   10       2.2%
Other values (61)    228      51.1%
(Missing)            89       20.0%

Lowest values, Dataset A
Value    Count    Frequency (%)
0.67     1        0.2%
0.75     2        0.4%
0.83     1        0.2%
1        3        0.7%
2        7        1.6%
3        4        0.9%
4        8        1.8%
5        2        0.4%
7        2        0.4%
8        2        0.4%

Lowest values, Dataset B
Value    Count    Frequency (%)
0.75     1        0.2%
0.83     1        0.2%
1        4        0.9%
2        6        1.3%
3        2        0.4%
4        7        1.6%
5        3        0.7%
6        1        0.2%
7        2        0.4%
8        3        0.7%

SibSp
Real number (ℝ)

                Dataset A     Dataset B
Distinct        7             7
Distinct (%)    1.6%          1.6%
Missing         0             0
Missing (%)     0.0%          0.0%
Infinite        0             0
Infinite (%)    0.0%          0.0%
Mean            0.52466368    0.55156951
Minimum         0             0
Maximum         8             8
Zeros           293           300
Zeros (%)       65.7%         67.3%
Negative        0             0
Negative (%)    0.0%          0.0%
Memory size     7.0 KiB       7.0 KiB

Quantile statistics

                             Dataset A    Dataset B
Minimum                      0            0
5-th percentile              0            0
Q1                           0            0
median                       0            0
Q3                           1            1
95-th percentile             2            2.75
Maximum                      8            8
Range                        8            8
Interquartile range (IQR)    1            1

Descriptive statistics

                                   Dataset A        Dataset B
Standard deviation                 0.98668518       1.1537081
Coefficient of variation (CV)      1.8806051        2.0916822
Kurtosis                           12.455164        16.872397
Mean                               0.52466368       0.55156951
Median Absolute Deviation (MAD)    0                0
Skewness                           3.0400849        3.6352761
Sum                                234              246
Variance                           0.97354764       1.3310425
Monotonicity                       Not monotonic    Not monotonic
Histogram with fixed size bins (bins=7)
Value    Dataset A      Dataset B
0        293 (65.7%)    300 (67.3%)
1        116 (26.0%)    106 (23.8%)
2        15 (3.4%)      17 (3.8%)
3        8 (1.8%)       6 (1.3%)
4        9 (2.0%)       9 (2.0%)
5        4 (0.9%)       4 (0.9%)
8        1 (0.2%)       4 (0.9%)
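
Because SibSp takes only seven distinct values, its Sum, Mean, and Variance can be recomputed from the value counts alone. The sketch below does this with the standard library and assumes the reported variance is the sample variance (dividing by n - 1), which is what the numbers are consistent with; the helper name is illustrative.

```python
import statistics


def stats_from_counts(value_counts: dict) -> dict:
    """Sum, mean, and sample variance from a {value: count} mapping."""
    data = [value for value, count in value_counts.items() for _ in range(count)]
    total = sum(data)
    return {
        "sum": total,
        "mean": total / len(data),
        "variance": statistics.variance(data),  # sample variance (n - 1)
    }


sibsp_a = {0: 293, 1: 116, 2: 15, 3: 8, 4: 9, 5: 4, 8: 1}
sibsp_b = {0: 300, 1: 106, 2: 17, 3: 6, 4: 9, 5: 4, 8: 4}

print("Dataset A:", stats_from_counts(sibsp_a))
print("Dataset B:", stats_from_counts(sibsp_b))
```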

Parch
Real number (ℝ)

                Dataset A     Dataset B
Distinct        6             6
Distinct (%)    1.3%          1.3%
Missing         0             0
Missing (%)     0.0%          0.0%
Infinite        0             0
Infinite (%)    0.0%          0.0%
Mean            0.39686099    0.3632287
Minimum         0             0
Maximum         5             6
Zeros           337           340
Zeros (%)       75.6%         76.2%
Negative        0             0
Negative (%)    0.0%          0.0%
Memory size     7.0 KiB       7.0 KiB

Quantile statistics

                             Dataset A    Dataset B
Minimum                      0            0
5-th percentile              0            0
Q1                           0            0
median                       0            0
Q3                           0            0
95-th percentile             2            2
Maximum                      5            6
Range                        5            6
Interquartile range (IQR)    0            0

Descriptive statistics

                                   Dataset A        Dataset B
Standard deviation                 0.82210419       0.76313509
Coefficient of variation (CV)      2.0715168        2.1009769
Kurtosis                           8.1714063        11.296429
Mean                               0.39686099       0.3632287
Median Absolute Deviation (MAD)    0                0
Skewness                           2.5881153        2.8149787
Sum                                177              162
Variance                           0.67585529       0.58237517
Monotonicity                       Not monotonic    Not monotonic
Histogram with fixed size bins (bins=6)
Value    Dataset A      Dataset B
0        337 (75.6%)    340 (76.2%)
1        58 (13.0%)     61 (13.7%)
2        43 (9.6%)      41 (9.2%)
3        2 (0.4%)       not present
4        3 (0.7%)       2 (0.4%)
5        3 (0.7%)       1 (0.2%)
6        not present    1 (0.2%)

Ticket
Text

                Dataset A    Dataset B
Distinct        384          383
Distinct (%)    86.1%        85.9%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Length

                 Dataset A    Dataset B
Max length       18           18
Median length    17           17
Mean length      6.7466368    6.6076233
Min length       3            3

Characters and Unicode

                       Dataset A    Dataset B
Total characters       3009         2947
Distinct characters    32           35
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        338          333
Unique (%)    75.8%        74.7%

Sample

           Dataset A     Dataset B
1st row    330932        A/4 48871
2nd row    350060        248731
3rd row    SO/C 14885    SCO/W 1585
4th row    PP 9549       364506
5th row    C.A. 33112    345572

Dataset A
Value                 Count    Frequency (%)
pc                    31       5.5%
c.a                   14       2.5%
2                     8        1.4%
ston/o                8        1.4%
a/5                   7        1.2%
sc/paris              6        1.1%
ca                    6        1.1%
w./c                  5        0.9%
a/4                   4        0.7%
ston/o2               4        0.7%
Other values (401)    472      83.5%

Dataset B
Value                 Count    Frequency (%)
pc                    33       5.9%
c.a                   13       2.3%
ca                    10       1.8%
a/5                   8        1.4%
3101295               5        0.9%
2144                  5        0.9%
ston/o                5        0.9%
2                     5        0.9%
w./c                  5        0.9%
sc/paris              4        0.7%
Other values (403)    470      83.5%

Most occurring characters

Value           Dataset A                   Dataset B
3               379 (12.6%)                 356 (12.1%)
1               342 (11.4%)                 362 (12.3%)
2               312 (10.4%)                 289 (9.8%)
6               222 (7.4%)                  205 (7.0%)
7               220 (7.3%)                  240 (8.1%)
0               216 (7.2%)                  183 (6.2%)
4               213 (7.1%)                  237 (8.0%)
5               194 (6.4%)                  189 (6.4%)
9               176 (5.8%)                  176 (6.0%)
8               141 (4.7%)                  147 (5.0%)
Other values    594 (19.7%), 22 distinct    563 (19.1%), 25 distinct

Most occurring categories, scripts, and blocks

Value        Dataset A        Dataset B
(unknown)    3009 (100.0%)    2947 (100.0%)

Every character falls under the single (unknown) category, script, and block, so the most frequent characters per category, per script, and per block are the counts listed under Most occurring characters.

Fare
Real number (ℝ)

                Dataset A    Dataset B
Distinct        176          183
Distinct (%)    39.5%        41.0%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            32.436919    31.627251
Minimum         0            0
Maximum         512.3292     263
Zeros           9            8
Zeros (%)       2.0%         1.8%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

                             Dataset A    Dataset B
Minimum                      0            0
5-th percentile              7.0719       7.2292
Q1                           7.8958       7.925
median                       13.5         14.8729
Q3                           31.275       34.5844
95-th percentile             120          108.9
Maximum                      512.3292     263
Range                        512.3292     263
Interquartile range (IQR)    23.3792      26.6594

Descriptive statistics

                                   Dataset A        Dataset B
Standard deviation                 49.692533        42.049652
Coefficient of variation (CV)      1.5319745        1.3295386
Kurtosis                           25.462476        12.298283
Mean                               32.436919        31.627251
Median Absolute Deviation (MAD)    6.25             7.3375
Skewness                           4.1835568        3.2090239
Sum                                14466.866        14105.754
Variance                           2469.3479        1768.1732
Monotonicity                       Not monotonic    Not monotonic
Histogram with fixed size bins (bins=50)
Common values, Dataset A
Value                 Count    Frequency (%)
13                    23       5.2%
26                    20       4.5%
8.05                  20       4.5%
7.8958                18       4.0%
7.75                  13       2.9%
10.5                  13       2.9%
7.925                 12       2.7%
7.775                 10       2.2%
0                     9        2.0%
8.6625                9        2.0%
Other values (166)    299      67.0%

Common values, Dataset B
Value                 Count    Frequency (%)
7.8958                23       5.2%
13                    21       4.7%
8.05                  21       4.7%
26                    13       2.9%
7.75                  13       2.9%
7.925                 10       2.2%
10.5                  10       2.2%
0                     8        1.8%
7.2292                8        1.8%
26.55                 7        1.6%
Other values (173)    312      70.0%

Lowest values, Dataset A
Value     Count    Frequency (%)
0         9        2.0%
5         1        0.2%
6.2375    1        0.2%
6.4958    2        0.4%
6.75      1        0.2%
6.8583    1        0.2%
6.95      1        0.2%
7.0458    1        0.2%
7.05      4        0.9%
7.0542    2        0.4%

Lowest values, Dataset B
Value     Count    Frequency (%)
0         8        1.8%
4.0125    1        0.2%
5         1        0.2%
6.2375    1        0.2%
6.4958    1        0.2%
6.975     1        0.2%
7.05      3        0.7%
7.0542    1        0.2%
7.225     4        0.9%
7.2292    8        1.8%

Cabin
Text

                Dataset A    Dataset B
Distinct        89           91
Distinct (%)    86.4%        82.0%
Missing         343          335
Missing (%)     76.9%        75.1%
Memory size     7.0 KiB      7.0 KiB

Length

                 Dataset A    Dataset B
Max length       15           15
Median length    3            3
Mean length      3.7669903    3.6756757
Min length       1            1

Characters and Unicode

                       Dataset A    Dataset B
Total characters       388          408
Distinct characters    18           19
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        80           73
Unique (%)    77.7%        65.8%

Sample

           Dataset A    Dataset B
1st row    G6           D19
2nd row    F2           B57 B59 B63 B66
3rd row    A16          E44
4th row    B18          E24
5th row    E101         C47

Dataset A
Value                Count    Frequency (%)
g6                   4        3.2%
b96                  4        3.2%
b98                  4        3.2%
c23                  3        2.4%
c25                  3        2.4%
c27                  3        2.4%
e101                 2        1.6%
c92                  2        1.6%
c65                  2        1.6%
d36                  2        1.6%
Other values (92)    95       76.6%

Dataset B
Value                Count    Frequency (%)
b96                  3        2.3%
b98                  3        2.3%
g6                   3        2.3%
f                    3        2.3%
b59                  2        1.5%
b63                  2        1.5%
b66                  2        1.5%
b57                  2        1.5%
b18                  2        1.5%
b20                  2        1.5%
Other values (94)    109      82.0%

Most occurring characters

Value           Dataset A                 Dataset B
B               41 (10.6%)                39 (9.6%)
C               40 (10.3%)                38 (9.3%)
6               33 (8.5%)                 29 (7.1%)
2               31 (8.0%)                 39 (9.6%)
1               30 (7.7%)                 36 (8.8%)
3               30 (7.7%)                 30 (7.4%)
5               26 (6.7%)                 24 (5.9%)
(space)         21 (5.4%)                 22 (5.4%)
9               21 (5.4%)                 20 (4.9%)
8               21 (5.4%)                 21 (5.1%)
Other values    94 (24.2%), 8 distinct    110 (27.0%), 9 distinct

Most occurring categories, scripts, and blocks

Value        Dataset A       Dataset B
(unknown)    388 (100.0%)    408 (100.0%)

Every character falls under the single (unknown) category, script, and block, so the most frequent characters per category, per script, and per block are the counts listed under Most occurring characters.

Embarked
Categorical

                Dataset A    Dataset B
Distinct        3            3
Distinct (%)    0.7%         0.7%
Missing         0            0
Missing (%)     0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Length

                 Dataset A    Dataset B
Max length       1            1
Median length    1            1
Mean length      1            1
Min length       1            1

Characters and Unicode

                       Dataset A    Dataset B
Total characters       446          446
Distinct characters    3            3
Distinct categories    1            1
Distinct scripts       1            1
Distinct blocks        1            1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

              Dataset A    Dataset B
Unique        0            0
Unique (%)    0.0%         0.0%

Sample

           Dataset A    Dataset B
1st row    Q            S
2nd row    S            S
3rd row    S            S
4th row    S            S
5th row    S            S

Common Values

Value    Dataset A      Dataset B
S        317 (71.1%)    340 (76.2%)
C        88 (19.7%)     74 (16.6%)
Q        41 (9.2%)      32 (7.2%)

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figures: common values plots for Dataset A and Dataset B]

Most occurring characters

Value    Dataset A      Dataset B
S        317 (71.1%)    340 (76.2%)
C        88 (19.7%)     74 (16.6%)
Q        41 (9.2%)      32 (7.2%)

Most occurring categories, scripts, and blocks

Value        Dataset A       Dataset B
(unknown)    446 (100.0%)    446 (100.0%)

Every character falls under the single (unknown) category, script, and block, so the most frequent characters per category, per script, and per block are the counts listed under Most occurring characters.

Interactions

[Figures: interaction plots for Dataset A and Dataset B]

2025-03-26T00:45:11.833241image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset A

2025-03-26T00:45:10.106391image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset B

2025-03-26T00:45:12.150775image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset A

2025-03-26T00:45:10.493959image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Dataset B

2025-03-26T00:45:12.456822image/svg+xmlMatplotlib v3.10.0, https://matplotlib.org/

Correlations

[Correlation heatmaps for Dataset A and Dataset B not reproduced in this text version.]

Dataset A

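Variable | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived
Age | 1.000 | 0.024 | 0.112 | -0.304 | 0.069 | 0.256 | 0.000 | -0.250 | 0.178
Embarked | 0.024 | 1.000 | 0.186 | 0.000 | 0.000 | 0.278 | 0.119 | 0.082 | 0.159
Fare | 0.112 | 0.186 | 1.000 | 0.408 | -0.042 | 0.497 | 0.186 | 0.465 | 0.327
Parch | -0.304 | 0.000 | 0.408 | 1.000 | -0.015 | 0.000 | 0.285 | 0.472 | 0.258
PassengerId | 0.069 | 0.000 | -0.042 | -0.015 | 1.000 | 0.049 | 0.086 | -0.092 | 0.097
Pclass | 0.256 | 0.278 | 0.497 | 0.000 | 0.049 | 1.000 | 0.151 | 0.139 | 0.326
Sex | 0.000 | 0.119 | 0.186 | 0.285 | 0.086 | 0.151 | 1.000 | 0.194 | 0.589
SibSp | -0.250 | 0.082 | 0.465 | 0.472 | -0.092 | 0.139 | 0.194 | 1.000 | 0.239
Survived | 0.178 | 0.159 | 0.327 | 0.258 | 0.097 | 0.326 | 0.589 | 0.239 | 1.000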

Dataset B

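Variable | Age | Embarked | Fare | Parch | PassengerId | Pclass | Sex | SibSp | Survived
Age | 1.000 | 0.149 | 0.113 | -0.280 | 0.052 | 0.291 | 0.059 | -0.155 | 0.111
Embarked | 0.149 | 1.000 | 0.192 | 0.000 | 0.061 | 0.219 | 0.073 | 0.000 | 0.099
Fare | 0.113 | 0.192 | 1.000 | 0.409 | -0.044 | 0.521 | 0.191 | 0.484 | 0.248
Parch | -0.280 | 0.000 | 0.409 | 1.000 | -0.086 | 0.000 | 0.273 | 0.449 | 0.162
PassengerId | 0.052 | 0.061 | -0.044 | -0.086 | 1.000 | 0.040 | 0.084 | -0.081 | 0.202
Pclass | 0.291 | 0.219 | 0.521 | 0.000 | 0.040 | 1.000 | 0.148 | 0.142 | 0.371
Sex | 0.059 | 0.073 | 0.191 | 0.273 | 0.084 | 0.148 | 1.000 | 0.191 | 0.552
SibSp | -0.155 | 0.000 | 0.484 | 0.449 | -0.081 | 0.142 | 0.191 | 1.000 | 0.181
Survived | 0.111 | 0.099 | 0.248 | 0.162 | 0.202 | 0.371 | 0.552 | 0.181 | 1.000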

Missing values

[Plots for Dataset A and Dataset B not reproduced in this text version.]
A simple visualization of nullity by column.

[Plots for Dataset A and Dataset B not reproduced in this text version.]
The nullity matrix is a data-dense display that lets you visually pick out patterns in data completion quickly.

[Plots for Dataset A and Dataset B not reproduced in this text version.]
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
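
The nullity correlation described above can be approximated by hand. The following is a minimal sketch, assuming the heatmap is a plain Pearson correlation between per-column missing-value indicators; the helper name `nullity_correlation` and the toy values are illustrative and are not taken from the report.

```python
import pandas as pd


def nullity_correlation(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation between the missing-value indicators of each column."""
    # 1 where a cell is missing, 0 where it is present
    indicators = df.isna().astype(int)
    # Columns that are fully present or fully missing have zero variance,
    # so their correlation is undefined; keep only columns that vary.
    varying = indicators.loc[:, indicators.nunique() > 1]
    return varying.corr()


# Toy frame with made-up values (hypothetical, not rows from Dataset A or B)
toy = pd.DataFrame({
    "Age": [22.0, None, 30.0, None],
    "Cabin": ["C85", None, "E46", None],
    "Fare": [7.25, 8.05, None, 13.0],
})
result = nullity_correlation(toy)
print(result.round(3))
```

A value near 1 means two columns tend to be missing in the same rows, a value near -1 means one tends to be present when the other is missing, and a value near 0 means the missingness patterns are unrelated.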

Sample

Dataset A

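(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
82 | 83 | 1 | 3 | McDermott, Miss. Brigdet Delia | female | NaN | 0 | 0 | 330932 | 7.7875 | NaN | Q
478 | 479 | 0 | 3 | Karlsson, Mr. Nils August | male | 22.0 | 0 | 0 | 350060 | 7.5208 | NaN | S
84 | 85 | 1 | 2 | Ilett, Miss. Bertha | female | 17.0 | 0 | 0 | SO/C 14885 | 10.5000 | NaN | S
10 | 11 | 1 | 3 | Sandstrom, Miss. Marguerite Rut | female | 4.0 | 1 | 1 | PP 9549 | 16.7000 | G6 | S
145 | 146 | 0 | 2 | Nicholls, Mr. Joseph Charles | male | 19.0 | 1 | 1 | C.A. 33112 | 36.7500 | NaN | S
115 | 116 | 0 | 3 | Pekoniemi, Mr. Edvard | male | 21.0 | 0 | 0 | STON/O 2. 3101294 | 7.9250 | NaN | S
340 | 341 | 1 | 2 | Navratil, Master. Edmond Roger | male | 2.0 | 1 | 1 | 230080 | 26.0000 | F2 | S
359 | 360 | 1 | 3 | Mockler, Miss. Helen Mary "Ellie" | female | NaN | 0 | 0 | 330980 | 7.8792 | NaN | Q
335 | 336 | 0 | 3 | Denkoff, Mr. Mitto | male | NaN | 0 | 0 | 349225 | 7.8958 | NaN | S
584 | 585 | 0 | 3 | Paulner, Mr. Uscher | male | NaN | 0 | 0 | 3411 | 8.7125 | NaN | C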

Dataset B

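(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
565 | 566 | 0 | 3 | Davies, Mr. Alfred J | male | 24.0 | 2 | 0 | A/4 48871 | 24.1500 | NaN | S
695 | 696 | 0 | 2 | Chapman, Mr. Charles Henry | male | 52.0 | 0 | 0 | 248731 | 13.5000 | NaN | S
239 | 240 | 0 | 2 | Hunt, Mr. George Henry | male | 33.0 | 0 | 0 | SCO/W 1585 | 12.2750 | NaN | S
461 | 462 | 0 | 3 | Morley, Mr. William | male | 34.0 | 0 | 0 | 364506 | 8.0500 | NaN | S
559 | 560 | 1 | 3 | de Messemaeker, Mrs. Guillaume Joseph (Emma) | female | 36.0 | 1 | 0 | 345572 | 17.4000 | NaN | S
621 | 622 | 1 | 1 | Kimball, Mr. Edwin Nelson Jr | male | 42.0 | 1 | 0 | 11753 | 52.5542 | D19 | S
631 | 632 | 0 | 3 | Lundahl, Mr. Johan Svensson | male | 51.0 | 0 | 0 | 347743 | 7.0542 | NaN | S
383 | 384 | 1 | 1 | Holverson, Mrs. Alexander Oskar (Mary Aline Towner) | female | 35.0 | 1 | 0 | 113789 | 52.0000 | NaN | S
217 | 218 | 0 | 2 | Jacobsohn, Mr. Sidney Samuel | male | 42.0 | 1 | 0 | 243847 | 27.0000 | NaN | S
311 | 312 | 1 | 1 | Ryerson, Miss. Emily Borie | female | 18.0 | 2 | 2 | PC 17608 | 262.3750 | B57 B59 B63 B66 | C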

Dataset A

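(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
342 | 343 | 0 | 2 | Collander, Mr. Erik Gustaf | male | 28.0 | 0 | 0 | 248740 | 13.0000 | NaN | S
260 | 261 | 0 | 3 | Smith, Mr. Thomas | male | NaN | 0 | 0 | 384461 | 7.7500 | NaN | Q
411 | 412 | 0 | 3 | Hart, Mr. Henry | male | NaN | 0 | 0 | 394140 | 6.8583 | NaN | Q
832 | 833 | 0 | 3 | Saad, Mr. Amin | male | NaN | 0 | 0 | 2671 | 7.2292 | NaN | C
549 | 550 | 1 | 2 | Davies, Master. John Morgan Jr | male | 8.0 | 1 | 1 | C.A. 33112 | 36.7500 | NaN | S
649 | 650 | 1 | 3 | Stanley, Miss. Amy Zillah Elsie | female | 23.0 | 0 | 0 | CA. 2314 | 7.5500 | NaN | S
398 | 399 | 0 | 2 | Pain, Dr. Alfred | male | 23.0 | 0 | 0 | 244278 | 10.5000 | NaN | S
470 | 471 | 0 | 3 | Keefe, Mr. Arthur | male | NaN | 0 | 0 | 323592 | 7.2500 | NaN | S
743 | 744 | 0 | 3 | McNamee, Mr. Neal | male | 24.0 | 1 | 0 | 376566 | 16.1000 | NaN | S
663 | 664 | 0 | 3 | Coleff, Mr. Peju | male | 36.0 | 0 | 0 | 349210 | 7.4958 | NaN | S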

Dataset B

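(index) | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
801 | 802 | 1 | 2 | Collyer, Mrs. Harvey (Charlotte Annie Tate) | female | 31.0 | 1 | 1 | C.A. 31921 | 26.2500 | NaN | S
798 | 799 | 0 | 3 | Ibrahim Shawah, Mr. Yousseff | male | 30.0 | 0 | 0 | 2685 | 7.2292 | NaN | C
432 | 433 | 1 | 2 | Louch, Mrs. Charles Alexander (Alice Adelaide Slow) | female | 42.0 | 1 | 0 | SC/AH 3085 | 26.0000 | NaN | S
328 | 329 | 1 | 3 | Goldsmith, Mrs. Frank John (Emily Alice Brown) | female | 31.0 | 1 | 1 | 363291 | 20.5250 | NaN | S
686 | 687 | 0 | 3 | Panula, Mr. Jaako Arnold | male | 14.0 | 4 | 1 | 3101295 | 39.6875 | NaN | S
884 | 885 | 0 | 3 | Sutehall, Mr. Henry Jr | male | 25.0 | 0 | 0 | SOTON/OQ 392076 | 7.0500 | NaN | S
401 | 402 | 0 | 3 | Adams, Mr. John | male | 26.0 | 0 | 0 | 341826 | 8.0500 | NaN | S
410 | 411 | 0 | 3 | Sdycoff, Mr. Todor | male | NaN | 0 | 0 | 349222 | 7.8958 | NaN | S
547 | 548 | 1 | 2 | Padro y Manent, Mr. Julian | male | NaN | 0 | 0 | SC/PARIS 2146 | 13.8625 | NaN | C
95 | 96 | 0 | 3 | Shorney, Mr. Charles Joseph | male | NaN | 0 | 0 | 374910 | 8.0500 | NaN | S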

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.
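
A duplicate-row check of this kind can be reproduced directly with pandas. The following is a minimal sketch, assuming a duplicate means a row whose values match an earlier row in every column; the helper name `count_duplicate_rows` and the toy frame are illustrative and do not come from the report.

```python
import pandas as pd


def count_duplicate_rows(df: pd.DataFrame) -> int:
    """Count rows that repeat an earlier row across all columns."""
    # duplicated() flags each repeat; the first occurrence is not flagged
    return int(df.duplicated().sum())


# Toy frame with made-up values: the third row repeats the second
toy = pd.DataFrame({"PassengerId": [1, 2, 2], "Survived": [0, 1, 1]})
n_dupes = count_duplicate_rows(toy)
print(n_dupes)
```

Because PassengerId is reported as unique in both datasets, no two rows can match in every column, which is consistent with the empty result here.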